Skip to content

[V1] Logits processors extensibility #19912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 336 commits into
base: main
Choose a base branch
from

Conversation

afeldman-nm
Copy link
Contributor

@afeldman-nm afeldman-nm commented Jun 20, 2025

Purpose

Enable V1 logits processors support to be extended with custom logits processors.

New Python interface for custom logitsprocs

    # Specify logitproc by entrypoint
    llm = LLM(
        model="facebook/opt-125m",
        logits_processors_entrypoints=["dummy_logitproc"],
    )

    # Specify logitproc by fully-qualified name
    llm = LLM(
        model="facebook/opt-125m",
        logits_processors_fqns=["vllm.v1.sample.logits_processor.impls:DummyLogitsProcessor"],
    )

New CLI interface for custom logitsprocs

# Engine CLI args for specifying logitsprocs by entrypoint
--logits-processors-entrypoints=dummy_logitproc,<other logitproc>,...

# Engine CLI args for specifying logitsprocs by fully-qualified name
--logits-processors-fqns=vllm.v1.sample.logits_processor.impls:DummyLogitsProcessor,<other logitproc>,...

Test Plan

(WIP)

  • Configuring a contrived custom logits proc
  • E2E tests using REST API and Python interfaces
  • Wrap a V0-style logits proc to create a V1-style logits proc

Test Result

WIP

(Optional) Documentation Update

WIP

Fixes #17799
Fixes #12678

Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
@mergify mergify bot added ci/build and removed needs-rebase labels Jul 16, 2025
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @afeldman-nm it's looking a lot better now. A few remaining comments but getting close I think.

Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @afeldman-nm

@@ -2067,7 +2069,7 @@ def _dummy_sampler_run(
output_token_ids=[[] for _ in range(num_reqs)],
allowed_token_ids_mask=None,
bad_words_token_ids={},
logitsprocs=LogitsProcessorManager(),
logitsprocs=LogitsProcessors(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether this should contain real LPs since they contain equivalent sampling tensors. Not necessarily needs to be addressed in this PR though.

Comment on lines +133 to +142
argmax_invariant: list["LogitsProcessor"] = field(
default_factory=list, init=False) # argmax-invariant logitsprocs
non_argmax_invariant: list["LogitsProcessor"] = field(
default_factory=list, init=False) # non-argmax-invariant logitsprocs

def __init__(
self,
logitsprocs: Optional[Iterator["LogitsProcessor"]] = None) -> None:
self.argmax_invariant = []
self.non_argmax_invariant = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these fields need to be declared here

Suggested change
argmax_invariant: list["LogitsProcessor"] = field(
default_factory=list, init=False) # argmax-invariant logitsprocs
non_argmax_invariant: list["LogitsProcessor"] = field(
default_factory=list, init=False) # non-argmax-invariant logitsprocs
def __init__(
self,
logitsprocs: Optional[Iterator["LogitsProcessor"]] = None) -> None:
self.argmax_invariant = []
self.non_argmax_invariant = []
def __init__(
self,
logitsprocs: Optional[Iterator["LogitsProcessor"]] = None) -> None:
self.argmax_invariant: list["LogitsProcessor"] = []
self.non_argmax_invariant: list["LogitsProcessor"] = []

Comment on lines +231 to +233
# Build logits processors. If specified by user, load custom
# logitsprocs constructors.
self.logitsprocs = logitsprocs or LogitsProcessors(None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the comment is needed or accurate here anymore?

Suggested change
# Build logits processors. If specified by user, load custom
# logitsprocs constructors.
self.logitsprocs = logitsprocs or LogitsProcessors(None)
self.logitsprocs = logitsprocs or LogitsProcessors()

SWAP = 1


# (index, params, output_tok_ids, prompt_tok_ids) tuples for new
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be a bit more of a logical ordering to have prompt token ids first?

# Every other scenario is a different way of loading a custom logitproc
_run_test(kwargs, logitproc_loaded=False)
return
elif logitproc_source == CustomLogitprocSource.LOGITPROC_SOURCE_ENTRYPOINT:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elif logitproc_source == CustomLogitprocSource.LOGITPROC_SOURCE_ENTRYPOINT:
if logitproc_source == CustomLogitprocSource.LOGITPROC_SOURCE_ENTRYPOINT:


def __init__(self, vllm_config: "VllmConfig", device: torch.device,
is_pin_memory: bool):
self.req_info = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.req_info = {}
self.req_info: dict[int, SamplingParams] = {}


# (index, params, output_tok_ids, prompt_tok_ids) tuples for new
# requests added to the batch.
AddedRequest = tuple[int, Union[SamplingParams, PoolingParams], list[int],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to support LP for pooling requests? Maybe we should just keep this as SamplingParams for now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/build documentation Improvements or additions to documentation frontend performance Performance-related issues speculative-decoding v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC]: Logits processor extensibility [Bug]: V1 engine ignores logits processors and min-p sampling
4 participants